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Considerable  effort  has  been,  and  is  :;ntinuing  to  be  spent  developing 
numerical  data  fitting  methods.  In  view  of  our  increasing  ability  to 
collect  immense  and  diverse  sets  of  data,  the  fact  that  data  obtained  J 
from  an  experiment  or  process  is  usually  not  in  a convenient  form  for 
Imnediate  interpretation  and  that  frequently  the  data  is  noisy  due  to 
Instrumental  or  human  error,  continued  research  on  this  topic  is  essen- 
tial. This  continued  research  should  lead  to  both  a deeper  understanding 
of  the  problems  associated  with  numerical  data  fitting  and  to  the  develop 
ment  of  numerical  data  fitting  software  packages.  The  complexity  of  this 
area  of  research  is  directly  related  to  the  complexity  and  variety  of 
data  sets  that  can  be  encountered. 


Thus,  there  is  a need  for  numerical  data  fitting  software  packages  which 
allow  for  effective  data  reduction  of  massive  data  sets  for  computational 
manipulation,  for  interpolation,  for  the  inclusion  of  specific  constraints 
(e.g.  monotoneness)  that  the  mathematical  representation  must  satisfy, 
etc.  To  date  there  exists  no  single  software  package  capable  of  processing 
any  given  data  set  in  even  the  one-dimensional  setting.  In  fact,  the 
development  of  such  an  all-purpose  black  box  is  probably  not  feasible 
today  nor  in  the  near  future.  In  this  regard  we  agree  with  the  comment 
made  by  J.  R.  Rice  [18]  that  it  is  irrational  to  expect  one  algorithm  to 
excel  in  all  respects  as  a general  purpose  curve  fitting  routine.  At 
present,  it  should  probably  be  required  that  all  existing  packages  be 
prefaced  with  a caution  that  the  particular  package  may  fail  to  return  an 
acceptable  fit  to  certain  data  sets.  For  example,  in  the  case  of  one- 
dimensional sparse  data,  a given  package  may  fit  the  data  to  within  any 
tolerance  desired  by  the  user  but  exhibit  gross  oscillatory  behavior 
between  the  data  points.  One  approach  that  might  avoid  many  of  these 
difficulties  is  to  use  an  interactive  data  fitting  package  with  graphics 
capabilities.  With  such  capabilities,  the  preformance  of  the  package  on  a 
given  data  set  could  be  monitored  during  use  or  at  least  upon  completion 
(possibly  only  when  failure  is  suspected  or  observed).  Better  yet,  such  a 
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1.  Introduction 
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package  could  allow  for  Intelligent  guidance  during  or  prior  to  the 
numerical  processing  of  the  data.  i 

In  this  paper,  we  wish  to  primarily  discuss  our  efforts  to  develop  adap- 
tive numerical  software  for  data  fitting.  Thus  our  aim  here  Is  not  to 
attempt  a survey  of  current  numerical  software  for  data  fitting. 

i 

He  shall  divide  our  discussion  into  two  sections.  In  the  first  section  we 
shall  discuss  one-dimensional  data  fitting  and  in  the  second  section  we 
shall  discuss  multi -dimensional  data  fitting.  For  the  one- dimensional 
setting,  we  have  developed  four  adaptive  numerical  software  packages  [1, 
10,  11].  We  are  now  beginning  a study  focusing  on  numerical  testing  and 
comparing  these  packages  and  other  packages  currently  available  for 
fitting  one-dimensional  data.  Here  we  would  like  to  be  able  to  classify 
or  characterize  properties  of  a given  data  set  which  makes  a particular 
package  superior  or  inferior.  Features  that  lead  to  successful  packages 
should  possibly  be  incorporated  into  the  more  effective  packages  where 
feasible  and  development  of  interactive  packages  should  be  pursued  for 
those  packages  that  appear  to  be  most  effective.  In  addition,  a survey  of 
current  numerical  software  curve  fitting  packages  together  with  reporting 
actual  experience  in  using  the  packages  could  be  very  useful. 

For  the  multi -dimensional  setting,  efforts  are  currently  underway  to 
develop  an  adaptive  numerical  software  package  for  fitting  multi- 
dimensional data  sets.  This  package  is  of  a slightly  different  philosophy 
than  the  one-dimensional  codes  mentioned  above  and  is  based  upon  the 
least  squares  approximation  operator.  It  is  anticipated  that  this  package 
will  be  available  in  the  late  spring  of  this  year. 

2.  One-Dimensional  Data  Fitting 

In  the  last  few  years,  considerable  interest  has  been  directed  towards 
developing  one-dimensional  data  fitting  techniques  [1,3,4,5,6,7,10,11,12, 
13,14,15,17,18,19].  This  activity  has  resulted  in  the  development  of 
algorithms  intended  for  interactive  use  [3,6,13]  and,  more  recently, 
adaptive  curve  fitting  packages  that  produce  piecewise  polynomial  fits 
have  appeared  [1,10,11,12,17].  These  latter  software  packages  are  adaptive 
In  the  sense  that  the  knots  or  joining  points  of  the  individual  polynomial 
pieces  are  automatically  inserted  by  the  code  as  needed. 
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The  four  adaptive  numerical  software  curve  fitting  packages  described  In 
[1,10,11]  all  utilize  the  same  adaptive  strategy.  These  packages  differ 
in  the  mode  of  approximation  that  is  used  to  determine  each  piece  of  the 
piecewise  polynomial  fit  to  the  data.  The  four  modes  of  approximation  are: 
(1)  uniform  approximation  [10],  (11)  least  squares  approximation  [1,10], 
(111)  uniform  restricted  range  approximation  [11]  and  (1  v)  fc-j  approxima- 
tion [1].  For  each  of  these  packages,  the  user  provides  the  data,  the 
degree  of  the  polynomial  pieces  desired,  the  overall  smoothness  the  final 
piecewise  polynomial  must  exhibit  on  some  interval  containing  the  Indepen- 
dent domain  of  the  data  and  a tolerance  which  is  used  to  determine  when 
.the  desired  accuracy  of  fit  Is  achieved.  In  addition,  for  the  uniform 
restricted  range  approximation  code  the  user  may  specify  a range  of  values 
(In  the  ordinate  direction)  about  each  data  point  through  which  the  fit 
■ust  pass.  (I.e.  At  each  point  (xpy^)  the  user  may  specify  values 
if  <.y.|  £ Up  < Up  for  which  the  computed  piecewise  polynomial  fit, 
p(x),  will  satisfy  ^ £ p(x^)  <_  Up)  If  the  user  does  not  specify  these 
values  then  they  will  be  set  automatically  by  the  code  using  a strategy 
that  roughly  allows  these  values  to  differ  in  a manner  proportional  to 
the  variation  of  the  data. 

In  what  follows  let  us  assume  that  we  wish  to  approximate  the  data 
t(xp  yi )>”«!•  Set  X * {x^.p  f(x^)  3 yp  1 » 1,  ....  M,  a=min{x:x  € X} 
and  b ■ max{x:  x e X>.  Further,  assume  that  one  wishes  to  approximate 
this  data  with  an  error  tolerance  of  at  most  e by  piecewise  polynomials 
of  degree  n - 1 and  overall  smoothness  on  [a,  b]  of  s,  s < n - 1.  In  this 
setting,  each  of  these  codes  will  calculate  an  approximation,  p,  to  the 
data  and  a set  of  knots  (t^f.j  C X with  a » tQ  < ^ < ...  < tk  - b such 
that  p restricted  to  [tpp  t^]  is  a polynomial  p^  € nn_j  * {q:  q is  a 
real  algebraic  polynomial  of  degree  £n  - 1},  p has  s continuous  deriva- 
tives on  [a,  b]  and  max{|f(x)  - p(x)|:  x € X}  £ e.  (For  the  restricted 
range  code  we  will  also  have  that  £ p(x^)  £ u^  for  1 * 1,  ...,  M.) 

Each  code  begins  by  calculating  tp  The  calculation  of  the  knots,  tp  is 
the  adaptive  feature  of  these  codes  and  this  is  done  in  a left-to-right 
manner  using  a bisection  strategy.  Basically,  the  code  first  calculates 
tj  to  be  essentially  the  largest  point  in  X such  that  the  best  approxima- 
tion (In  the  mode  of  the  particular  code  being  used),  pp  to  f on 
Sj  - [a,  t^OX  satisfies  max{|f(x)  - p1  (x)| : x6S1>£e.  This  Is  done 
by  first  setting  a * a and  c » b ■ b and  finding  the  best  approximation. 
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p<l«  to  f on  [a,  c]  n X.  If  max{|f(x)  - p-|(x)|:  x e [a,  c]  fl  X}  <.  e then 
p-j  Is  the  desired  approximation  and  It  is  returned  and  the  algorithm 
terminates.  If  this  Is  not  the  case,  then  a new  ce  X Is  chosen  where  c 
Is  a closest  point  In  X to  j(a  + ft),  taken  to  be  the  larger  point  when 
there  are  two  closest  points  In  X to  this  value.  Next,  the  best  approxi- 
mation to  f on  [a,  c]  H X is  calculated  and  the  maximum  absolute  pointwise 
error  on  [a,  c]  n X is  found.  If  this  error  Is  <_  e then  a is  now  set  to 
be  c,  whereas  In  the  contrary  case  6 is  then  set  to  be  c.  In  this  manner 
the  set  [a,  a]  n X is  always  the  largest  current  set  on  which  we  can 
approximate  the  data  with  a pointwise  error  £ e and  [a,  &]HX  Is  the 
smallest  current  set  on  which  we  cannot  approximate  the  data  with  a point- 
wise  error  <_  e.  This  procedure  Is  continued  by  again  setting  c s ^-(a  + £>) 
and  continues  until  either  a and  & are  adjacent  points  of  X on  (&-a)<n, 
where  n Is  a preset  constant  in  the  code.  Finally,  the  algorithm  set 
t,  ■ a when  the  above  phase  terminates.  If  s = 0 so  that  only  continuity 
Is  required  on  [a,  b],  then  tj  is  taken  to  be  tj.  If  s > 0,  the  algorithm 
then  may  "back  off"  from  t-j  in  an  attempt  to  avoid  the  introduction  of 
unwanted  steep  oscillations  in  the  approximation.  This  "backing-off" 
procedure  is  probably  very  similar  to  that  briefly  mentioned  by  Rice  [19] 
for  his  1-pass  algorithm.  This  procedure  consists  of  examining,  the  error 
function  at  the  m = n - s - 1 largest  extreme  points,  C-j » ....  ^ of 
(a,  fc|]  O X where  ev  being  an  extreme  point  means  that  jfUv)-p-j Uv)  I 
£Slgn(f(ev)  - P1(Cv))*(f(x)  - p-,  (x) ) for  x = maxtt  6.  X:  t<$v)  and 
x ■ mlnlteX:  t>E  h Then,  t,  is  chosen  to  be  the  largest  s such  that 
|f'Uv)  - P^UV)|  is  less  than  a (user  definable)  prescribed  tolerance, 
where  Is  the  derivative  of  the  quadratic  interpolation  of  f centered 
at  £ If  there  does  not  exist  such  a 5 . then  t,  is  chosen  to  be  the 
largest  at  which  |f'(Ev)  - p-| (Cv) | is  a minimum.  Our  numerical  exper- 
ience Indicates  that  this  procedure  contributes  significantly  towards  the 
stability  of  the  algorithm.  The  original  motivation  for  this  procedure 
was  that  In  the  theoretical  case  of  approximating  a continuously  differen- 
tiable f on  an  interval  we  are  guaranteed  that  f'  and  pj  agree  at  Interior 
extreme  points.  Thus,  when  we  smoothly  join  the  next  polynomial  piece, 
p2.  of  our  piecewise  polynomial  p to  p-j  at  tj,  this  next  piece  will 
closely  follow  the  direction  of  f at  least  near  t-j. 

Once  the  first  knot  t-j  has  been  chosen,  the  algorithm  then  repeats  the 
above  steps  on  the  set  [tj,  b]  0 X provided  t^  < b with  the  additional 
requirement  that  each  best  approximation,  p2,  must  now  also  satisfy  the 
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Interpolate^  constraints  p^(tj)  a p{^(tj),  J ■ 0,  1,  ....  s.  The 
Inclusion  of  these  Interpolatory  constraints  into  the  approximation 
bperator  Is  straightforward  since  they  are  imposed  at  the  left-hand  end- 
point of  the  Interval  of  approximation.  Finally,  special  care  is  also 
taken  In  selecting  the  last  knot  tk-1  so  that  the  set  [tk_j,  b]0  X has 
sufficient  points  toallow  for  a reasonable  approximation  to  be  calculated 
on  It. 

* • 

One  other  common  feature  of  these  codes  Is  the  Inclusion  of  a subroutine 
that  will  add  artificial  data  points  to  a sparse  data  set.  As  mentioned 
earlier,  approximation  on  a small  (sparse)  data  set  can  lead  to  unaccep- 
table approximations.  Basically,  what  happens  Is  that  the  approximant 
fits  the  sparse  data  very  well  but  oscillates  badly  between  the  data 
points.  In  these  cases  we  have  found  that  the  addition  of  artificial 
data  points  will  sometimes  remove  this  undesirable  behavior.  This 
optional  subroutine  will  Insert  a continuous  piecewise  linear  function 
through  the  data  and  then  discretize  this  function  to  increase  the  size 
of  the  data  set.  Note  that  It  Is  the  user's  responsibility  to  decide  if 
piecewise  linearization  Is  really  the  best  way  to  add  artificial  data 
points  In  his  particular  situation. 

Based  upon  our  general  Intuition  of  these  various  approximation  operators, 
we  suspect*  that  the  uniform  adaptive  curve  fitting  package  should  be  used 
only  on  either  precise  data  (e.g.  from  mathematical  formulas)  or  on 
essentially  noise-free  data.  The  least  squares  adaptive  curve  fitting 
package  should  be  effective  for  fitting  data  sets  containing  essentially 
normally  distributed  random  noise.  The  uniform  restricted  range  adaptive 
curve  fitting  package  should  allow  the  user  the  greatest  control  over 
the  shape  of  the  resulting  fit  (at  a cost  of  having  to  understand  the 
code  to  a greater  depth).  Here,  the  user  can  actually  create  restraining 
bands  within  which  the  resulting  approximation  will  lie.  Finally,  the  z-j 
adaptive  curve  fitting  package  should  be  effective  for  fitting  large  data 
sets  that  contain  points  which  are  very  inaccurate  with  respect  to  the 
overall  accuracy  of  the  data.  Finally,  for  sets  suspected  of  having  an 
occasion  highly  Inaccurate  data  point,  we  have  used  a slightly  different 
error  tolerance  check.  This  second  error  tolerance  check  says  that  the 
error  tolerance  criterion  Is  not  met  when  the  polntwlse  absolute  error  Is 
greater  than  e at  two  consecutive  points  of  X. 
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In  the  way  of  examples  we  have  Included  seven  figures  Illustrating  the 
results  of  running  algorithms  (11 )-(1v).  Figures  1 through  5 represent 
the  results  of  running  these  algorithms  on  some  oil  shale  data.  Since 
relatively  few  data  points  (denoted  by  "x"  In  figures)  were  available,  we 
filled  In  the  gaps  using  our  linear  Interpolation  routine  with  200-300 
points.  Figures  2 and  3 are  the  same  approximation,  the  first  one  also 
plotting  the. restraining  curves.  Figures  6 and  7 are  examples  of  approxi- 
mating the  exponential  function  with  noise  on  101  equally  spaced  points 
In  [0,  2]  where  the  second  error  tolerance  check  is  used.  All  of  these 
runs  were  done  on  Colorado  State  University's  CDC  CYBER  172.  The  run  time 
was  roughly  the  same  for  the  approximation  code  and  the  restricted 
range  approximation  code  (taking  roughly  one-half  second  per  subinterval 
needed)  and  the  run  time  of  the  i-j  approximation  code  was  roughly  two  to 
three  times  greater. 

He  also  approximated  y'x  on  201  equally  spaced  points  in  [0,  2]  with 
algorithm  (1)  using  n * 6,  s s 2 and  e = .01.  A piecewise  polynomial  fit 
was  computed  in  1.97  seconds,  having  an  absolute  error  of  .0093  at  these 
points,  with  a total  of  5 polynomial  pieces.  The  knot  locations  were  0, 
.08,  .230,  .350,  .810,  2.0.  We  chose  this  example  to  check  the  ability  of 
the  code  to  recover  from  a point  where  f(x)  is  difficult  to  approximate 
(here  at  x * 0).  In  way  of  comparison  with  straight  least  squares  cubic 
spline  (fixed  knots)  curve  fitting  we  also  approximated  Hi  on  201  equally 
spaced  points  using  as  our  approximating  family  the  set  of  all  cubic 

■f 

splines  with  knots  at  and  the  routine  ICSFKU  in  the  IMSL  library. 

The  maximum  error  of  this  fit  over  the  201  points  in  [0,  2]  used  was  .032 
(at  x • 0,  the  error  improved  towards  2)  and  the  run  time  was  4.29 
seconds.  Using  this  same  routine  with  only  eleven  equally  spaced  knots 
gave  a maximum  error  of  .0597  in  a run  time  of  1.68  seconds.  Finally,  we 
also  attempted  to  use  the  variable  knot  cubic  spline  fitting  routine 
ICSVKU  of  the  IMSL  library  with  eleven  knots  on  this  problem.  However,  it 
failed  to  return  satisfactory  results  (maximum  error  14.79  in  a run  time 
of  13.2  seconds).  In  defense  of  this  routine  it  should  be  noted  that  its 
documentation  specifies  that  it  is  not  intended  for  use  in  approximating 
precise  mathematical  functions  and  we  initialized  it  with  equally  spaced 
knots. 


In  closing  this  section  we  wish  to  briefly  mention  three  other  adaptive 
curve  fitting  studies.  The  first  is  due  to  J.  A.  Payne  [15].  In  this 
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paper  an  automatic  curve  fitting  package  Is  described.  Basically  the 
knots  and  the  degrees  of  the  polynomial  pieces  of  the  piecewise  polynomial 
fit  are  selected  in  a left-to-rlght  manner  via  a statistical  criterion. 
Then  the  final  fit  is  found  by  doing  a simultaneously  least  squares  fit 
j of  all  the  Individual  pieces  on  the  whole  data  set. 

i — . 

In  [17],  J.  R.  Rice  has  developed  an  adaptive  numerical  software  package 
based  upon  local  Hermlte  interpolation  rather  than  approximation  on  each 
( subinterval  to  obtain  each  polynomial  piece.  This  algorithm  is  primarily 

j Intended  for  approximating  mathematical  functions  where  one  has  access  to 

i a certain  number  of  derivatives  at  each  point.  The  adaptive  strategy  is 

similar  to  ours  but  without  any  backing-off  procedure.  In  addition  one 
can  measure  the  error  of  approximation  in  any  preselected  Lp  norm  (p>J). 
This  code  will  return  a smooth  (user  specified)  piecewise  polynomial  and 
It  will  have  fast  run  times. 

t 

In  [12],  an  adaptive  numerical  software  curve  fitting  routine  that  com- 
putes C1  piecewise  cubic  polynomials  is  given.  This  routine  is  similar  to 
our  least  squares  routine  with  the  major  difference  being  in  the  "backing- 
off"  procedure  for  selecting  knots.  They  use  a simpler  procedure  based 
upon  Ideas  of  Powell.  Also,  it  should  be  noted  that  their  algorithm  is 
also  applicable  to  on-line  systems  since  a tolerance  need  not  be  pre- 
scribed In  advance.  ' , 
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Considerable  effort  has  been  (and  is  being)  expended  developing  methods 
of  approximation  of  multi-dimensional  data.  A recent  excellent  survey  by 
L.  L.  Schumaker  [21]  should  be  consulted  for  an  overall  view  of  the 
current  state  of  the  art  in  this  topic.  At  present,  according  to  J.  R. 
Rice  [20]  In  a talk  given  at  the  October  1 978-  SIAM  meetings,  there  are 
no  adaptive  numerical  software  packages  available  for  approximating 
multi-dimensional  data  sets.  In  this  section,  we  will  briefly  discuss  an 
adaptive  multi-dimensional  data  fitting  code  currently  being  developed 
by  C.  R.  Vogel  and  myself  [22].  We  hope  to  make  this  code  available  In 
the  late  spring.  Our  philosophy  has  been  to  develop  a reasonably  general 
code  sometimes  at  the  cost  of  efficiency.  Thus,  for  example,  the  user 
will  have  a reasonable  choice  of  possible  basis  functions  and  to  Increase 
this  selection  the  user  would  have  to  modify  the  code  Itself.  (We  will 
probably  Include  some  directions  for  doing  this).  Also,  the  code  will 
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probably  be  set  up  for  two-dimensional  data  with  Instructions  on  how  to 
modify  It  to  treat  larger  dimensional  data  sets.  In  addition,  it  will  be 
based  upon  the  least  squares  approximation  operator  only.  For  simplicity, 
we  shall  discuss  the  two-dimensional  case  in  what  follows. 

Thus,  given  a two-dimensional  scattered  data  set  contained  in  a rectangle 

R,  the  code  will  attempt  to  calculate  a piecewise  polynomial  of  user 

prescribed  degree  and  smoothness  that  fits  the  data  with  a least  squares 

error  less  than  or  equal  to  a user  prescribed  tolerance.  The  respective 

polynomial  pieces  will  be  defined  over  subrectangles  with  sides  parallel 

mm  ■}  -i 

to  the  sides  of  R with  each  piece  being  of  the  form  7 T a-^x  y*  . To 

1*0  j*0 

describe  our  adaptive  strategy,  assume  that  the  algorithm  has  subdivided  R 

k k 

Into  k subrectangles,  {R.L_,,  such  that  U R.-  = R and  R.,  O R.  is  at  most 

1 i i i*l  ‘ 1 J 

an  edge  for  1 f j.  Then  on  this  subdivision  one  solves  a least  squares 
approximation  problem  with  equality  constraints  to  find  simultaneously 
p-|»  ...»  Pk  with  p^  defined  on  R^,  that  gives  the  best  least  squares 
approximation  (of  minimal  least  squares  coefficient  sum  in  the  event  of 
nonuniqueness)  to  the  given  data  where  the  equality  constraints  corres- 
pond to  the  smoothness  required.  Setting  p equal  to  the  resulting  piece- 
wise  polynomial  approximation,  the  least  squares  norm  of  the  difference 
of  the  data  minus  the  value  of  p at  the  data  point  is  calculateo  over 
each  subrectangle.  Each  of  these  values  is  then  compared  to  the  user 
prescribed  tolerance  divided  by  vie.  If  for  a particular  subrectangle  this 
value  is  greater  than  the  prescribed  tolerance  divided  by  rlc”,  then  this 
rectangle  must  be  subdivided  on  the  next  pass  of  the  algorithm  (provided 
the  total  number  of  subrectangles  does  not  become  too  large).  This  rec- 
tangle is  then  divided  into  two  subrectangles  such  that  the  (modulo  a 
scaling  factor)  two  subrectangles  are  as  close  to  squares  as  possible. 
This,  then  forms  a new  collection  of  subrectangles  on  which  a new  least 
squares  fit  is  computed  as  above.  In  the  computation  of  these  least 
squares,  fits  we  are  using  a weighted  least  squares  approach  based  on 
Given's  rotations  as  described  in  [8]  rather  than  the  Lagrange  multiplier 
approach  of  [3].  At  present  we  have  a running  FORTRAN  code  that  corres- 
ponds to  the  above  brief  description.  Currently,  we  are  running  numerical 
tests  on  this  code  and  as  mentioned  earlier,  we  hope  to  make  this  code 
available  for  general  use  In  the  near  future. 
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